
European Radiology

Springer Science and Business Media LLC

Preprints posted in the last 90 days, ranked by how well they match European Radiology's content profile, based on 11 papers previously published here. The average preprint has a 0.14% match score for this journal, so anything above that is already an above-average fit.

1
SCOPE: AI-Assisted Early Detection of Potentially Curable Pancreatic Neoplasms on CT from Local and Global Information

Oviedo, F.; Lopez Ramirez, F.; Blanco, A.; Facciola, J.; Kwak, S.; Zhao, J. M.; Syailendra, E. A.; Tixier, F.; Dodhia, R.; Hruban, R. H.; Weeks, W. B.; Lavista Ferres, J. M.; Chu, L. C.; Fishman, E. K.

2026-02-05 radiology and imaging 10.64898/2026.02.04.26345495
Top 0.1%
99× avg

Purpose: To develop SCOPE (Small-lesion COntextual Pancreatic Evaluator), a deep learning model designed to improve CT detection of small pancreatic lesions (pancreatic ductal adenocarcinoma [PDAC], pancreatic neuroendocrine tumors [PanNETs], and cystic lesions) by integrating voxel-level features with global context. Materials and Methods: This retrospective study used three independent datasets. A development cohort of 4,065 contrast-enhanced CT scans was used to train a deep neural network that performs pancreas, ductal, and lesion segmentation with an integrated classification head. A metamodel combined segmentation-derived and global contextual signals for case-level prediction. Performance was assessed on (1) an internal holdout test set (n = 605), (2) an external multi-institutional PDAC dataset from the PANORAMA challenge (n = 2,238), and (3) an expert-curated small-lesion reader study (n = 200). Areas under the receiver operating characteristic curve (AUCs) were compared using the DeLong test; sensitivities and specificities using McNemar's test. Results: On the internal test set, SCOPE improved lesion-versus-normal AUC compared with the best segmentation baseline (0.974 [95% CI: 0.964, 0.984] vs 0.956; P = .006) and increased small-lesion sensitivity at 95% specificity (0.727 [95% CI: 0.653, 0.801] vs 0.600; P = .012). Performance gains were observed across lesion classes, with significant improvements for PDAC and PanNET detection. On the external dataset, SCOPE improved PDAC-versus-non-PDAC AUC (0.978 vs 0.861, P < .001) and achieved higher sensitivity at 90% and 95% specificity without retraining. In the small-lesion reader study, SCOPE achieved a lesion-versus-normal AUC of 0.922 and performed within the range of subspecialty abdominal radiologists; SCOPE provided the correct diagnosis in 14.5% (29/200) of cases in which two or more readers were incorrect.
Conclusion: SCOPE improves early detection of small, potentially curable pancreatic lesions on CT by combining local segmentation and global pancreatic context. Its consistent performance across internal, external, and reader datasets supports potential use as a concurrent reader for earlier and more accurate pancreatic lesion detection.
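The reported operating points (sensitivity at 90% and 95% specificity) can be illustrated with a minimal numpy sketch on synthetic classifier scores; the score distributions and the function name are hypothetical, not SCOPE's actual outputs.

```python
import numpy as np

def sensitivity_at_specificity(scores, labels, target_spec=0.95):
    """Pick the threshold that achieves the target specificity on the
    negatives, then report sensitivity on the positives at that threshold."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    neg = scores[labels == 0]
    thr = np.quantile(neg, target_spec)       # 95% of negatives fall at or below
    sens = np.mean(scores[labels == 1] > thr)
    spec = np.mean(neg <= thr)
    return sens, spec

rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(0, 1, 1000),   # normal pancreata
                         rng.normal(2, 1, 1000)])  # lesion cases
labels = np.concatenate([np.zeros(1000, int), np.ones(1000, int)])
sens, spec = sensitivity_at_specificity(scores, labels)
```

Sweeping `target_spec` traces out the same ROC curve that the reported AUCs summarize.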

2
Open-Source Offline-Deployable Retrieval-Augmented Large Language Model for Assisting Pancreatic Cancer Staging

Johno, H.; Amakawa, A.; Komaba, A.; Tozuka, R.; Johno, Y.; Sato, J.; Yoshimura, K.; Nakamoto, K.; Ichikawa, S.

2026-01-01 radiology and imaging 10.64898/2025.12.26.25343050
Top 0.2%
92× avg

Purpose: Large language models (LLMs) are increasingly applied in radiology, but key challenges remain, including data leakage from cloud-based systems, false outputs, and limited reasoning transparency. This study aimed to develop an open-source, offline-deployable retrieval-augmented LLM (RA-LLM) system in which local execution prevents data leakage and retrieval-augmented generation (RAG) improves output accuracy and transparency using reliable external knowledge (REK), demonstrated in pancreatic cancer staging. Materials and Methods: Llama-3.2 11B and Gemma-3 27B were used as local LLMs, and GPT-4o mini served as a cloud-based comparator. The Japanese pancreatic cancer guideline served as REK. Relevant REK excerpts were retrieved to generate retrieval-augmented responses. System performance, including classification accuracy, retrieval metrics, and execution time, was evaluated on 100 simulated pancreatic cancer CT cases, with non-RAG LLMs as baselines. McNemar's test was applied to TNM staging and resectability classification. Results: RAG improved TNM staging accuracy for all LLMs (GPT-4o mini 61% → 90%, p < 0.001; Llama-3.2 11B 53% → 72%, p < 0.001; Gemma-3 27B 59% → 87%, p < 0.001) and mildly improved resectability classification (72% → 84%, p = 0.012; 58% → 73%, p = 0.006; 77% → 86%, p = 0.093), with Gemma-3 27B showing performance comparable to GPT-4o mini. Retrieval performance was high (context recall = 1; context precision = 0.5-1), and local models ran at speeds comparable to the cloud-based GPT-4o mini. Conclusion: We developed an offline-deployable RA-LLM system for pancreatic cancer staging and publicly released its full source code. RA-LLMs outperformed baseline LLMs, and the offline-capable Gemma-3 27B performed comparably to the widely used cloud-based GPT-4o mini.
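The retrieval-augmented pipeline described here (retrieve guideline excerpts, then generate a grounded answer) can be sketched in a few lines. Everything below is an illustrative stand-in: the keyword-overlap retriever substitutes for the system's actual retriever, and the excerpt texts are invented, not quotes from the Japanese guideline.

```python
def retrieve(query, excerpts, k=2):
    """Rank guideline excerpts by word overlap with the query
    (a toy stand-in for the system's retriever)."""
    q = set(query.lower().split())
    ranked = sorted(excerpts,
                    key=lambda e: len(q & set(e.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(case_findings, excerpts):
    """Assemble a retrieval-augmented staging prompt for the LLM."""
    context = "\n".join(f"- {e}" for e in retrieve(case_findings, excerpts))
    return ("Using ONLY the guideline excerpts below, assign TNM stage "
            "and resectability.\n"
            f"Guideline excerpts:\n{context}\n"
            f"CT findings: {case_findings}\n")

guideline = [
    "T1: tumor limited to the pancreas, 2 cm or less in greatest dimension.",
    "Resectable: no arterial contact and no more than 180 degrees venous contact.",
    "N1: metastasis in 1 to 3 regional lymph nodes.",
]
prompt = build_prompt("1.8 cm tumor limited to the pancreas, no nodal disease",
                      guideline)
```

Grounding the prompt in retrieved excerpts is what makes the output auditable: the reader can check the cited guideline text against the model's staging decision.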

3
The Effect of AI on the Radiologist Workforce: A Task-Based Analysis

Langlotz, C. P.

2025-12-22 radiology and imaging 10.64898/2025.12.20.25342714
Top 0.3%
79× avg

Background: The effect of AI algorithms on the radiology workforce has been a subject of commentary and controversy. There is now sufficient published evidence to support a quantitative task-based analysis to predict these effects. Purpose: To construct a quantitative, task-based model that predicts the effect of AI on the radiology workforce using the best available evidence. Materials and Methods: We reviewed the literature to establish the tasks on which radiologists spend their time. We then developed categories of AI applications that could affect these tasks. We used published evidence to estimate the effect of each AI application on each radiology task over a 5-year time horizon. When published evidence was unavailable, we used our own judgment. Results: The model projects a 33% reduction in hours worked by radiologists in 5 years, with a range of 14% to 49%. The main effects are due to radiology report drafting for all modalities and study delegation for radiography and mammography. Conclusion: AI applications will likely cause a significant decrease in radiologist hours worked. Given the relatively static radiology workforce and the continued growth in imaging volumes, radiologist job loss is unlikely for the foreseeable future.
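The headline projection of a task-based model like this reduces to weighting each task's share of radiologist hours by the fraction of that task's hours an AI application is estimated to remove. The task names, shares, and effects below are illustrative placeholders, not the paper's estimates.

```python
# share of radiologist hours per task (illustrative), paired with the
# fraction of that task's hours assumed to be removed by AI in 5 years
tasks = {
    "report drafting":        (0.40, 0.50),
    "image interpretation":   (0.30, 0.10),
    "communication/consults": (0.15, 0.05),
    "procedures/other":       (0.15, 0.00),
}

# overall projected reduction is the share-weighted sum of the effects
reduction = sum(share * effect for share, effect in tasks.values())
```

Running best-case and worst-case effect estimates through the same sum is how a range like 14%-49% around a central projection arises.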

4
AI-based radiomics for pancreatic cysts: high diagnostic performance amid a persistent translational gap

Lettner, J. D.; Evrenoglou, T.; Binder, H.; Fichtner-Feigl, S.; Neubauer, C.; Ruess, D. A.

2026-02-12 radiology and imaging 10.64898/2026.02.10.26345995
Top 0.4%
76× avg

Background: AI-based radiomics has demonstrated promising diagnostic performance for pancreatic cystic neoplasms, yet clinical translation remains limited. Whether this reflects insufficient model performance or structural limitations of the evidence base remains unclear. Methods: We performed a systematic review and diagnostic test accuracy meta-analysis of AI-based radiomics in pancreatic cysts (2015-2025), addressing two clinically relevant tasks (Q1: cyst type differentiation; Q2: malignancy or high-grade dysplasia prediction). Training and validation datasets were synthesized independently using hierarchical models. Study evaluation extended beyond diagnostic performance to a four-dimensional framework integrating RQS 2.0, METRICS, TRIPOD+AI, and PROBAST+AI, explicitly contrasting pooled diagnostic performance with reporting quality, methodological rigor, and risk of bias. The review was pre-registered (PROSPERO) and conducted according to PRISMA 2020. Results: Twenty-nine studies were included (Q1: n = 15; Q2: n = 14), predominantly retrospective and single-center. Training-based analyses showed high apparent diagnostic performance for Q1 (pooled sensitivity/specificity: 0.89 [95% CI, 0.85-0.92]/0.90 [0.85-0.93]), but there was substantial heterogeneity (τ² = 0.56/0.78; ρ = 0.38). Validation-based performance remained high (0.86 [0.82-0.89]/0.88 [0.81-0.93]), while heterogeneity persisted and prediction regions exceeded confidence regions. For Q2, training-based analyses demonstrated similarly high apparent performance (0.88 [0.79-0.95]/0.89 [0.81-0.94]), with pronounced heterogeneity (τ² = 1.98/1.61; ρ = 0.63). Validation-based performance was slightly lower, yet still clinically comparable (0.82 [0.75-0.89]/0.86 [0.80-0.91]), and heterogeneity persisted (τ² = 0.71/0.43; ρ = 0.15).
Across both tasks, high diagnostic accuracy occurred alongside incomplete reporting, limited validation, and an elevated risk of bias. Conclusion: AI-based radiomics for pancreatic cysts has reached a structural performance plateau. Further improvements in diagnostic accuracy alone are insufficient to achieve clinical translation and must be accompanied by a paradigm shift from performance-driven model development toward decision-anchored study designs, robust validation strategies, transparent reporting standards, and clinically integrated evaluation frameworks. Summary: Although pancreatic cystic lesions are increasingly being detected, imaging-based decision-making remains limited, particularly regarding differentiating between cyst types and stratifying malignancy risk. In this PRISMA-compliant, PROSPERO-registered systematic review and diagnostic test accuracy meta-analysis, we evaluated AI-based radiomics for these two tasks, as well as its contextualized performance. A four-dimensional framework was employed for the evaluation, incorporating diagnostic accuracy, reporting quality, risk of bias, and radiomics maturity. Across studies published between 2015 and 2025, pooled diagnostic performance was consistently high, with only modest declines from the training to the validation stage. Nevertheless, considerable between-study heterogeneity and limited transportability remained evident. Multidimensional evaluation indicated a systematic dissociation between reported performance and methodological robustness, characterized by incomplete reporting, restricted validation, and an elevated risk of bias. These limitations were consistent across both clinical questions and were not resolved by increasing model complexity. The findings of this meta-analysis suggest that the structural performance of AI-based radiomics for pancreatic cysts has plateaued.
To progress toward clinical translation, study designs anchored in decision-making processes, robust multi-center validation, and transparent, reproducible evaluation frameworks are needed, rather than further optimization of model architecture alone.
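The paper pools accuracy with hierarchical bivariate models; as a simpler univariate illustration of how a between-study variance τ² enters such pooling, here is a DerSimonian-Laird random-effects estimate on hypothetical logit-sensitivities (the study values below are invented).

```python
import numpy as np

def dersimonian_laird(y, v):
    """Random-effects pooling of effect sizes y with within-study
    variances v, using the DerSimonian-Laird tau^2 estimator."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1.0 / v                                   # fixed-effect weights
    ybar = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - ybar) ** 2)               # Cochran's Q
    tau2 = max(0.0, (q - (len(y) - 1)) /
               (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    pooled = np.sum(w_re * y) / np.sum(w_re)
    return pooled, tau2

# hypothetical logit(sensitivity) per study with within-study variances
logit_sens = [2.1, 1.4, 2.6, 1.9, 1.1]
variances = [0.10, 0.08, 0.15, 0.12, 0.09]
pooled_logit, tau2 = dersimonian_laird(logit_sens, variances)
pooled_sens = 1 / (1 + np.exp(-pooled_logit))     # back-transform to [0, 1]
```

A large τ² relative to the within-study variances is exactly the "substantial heterogeneity" the review reports: the pooled point estimate stays high while the prediction region for a new study widens.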

5
Nationwide Organ Volume Reference Standards and Aging-Related Changes in Abdominal CT from Japan

Kikuchi, T.; Yamamoto, K.; Yamagishi, Y.; Akashi, T.; Hanaoka, S.; Yoshikawa, T.; Fujii, H.; Mori, H.; Makimoto, H.; Kohro, T.

2026-02-03 radiology and imaging 10.64898/2026.01.30.26345246
Top 0.4%
76× avg

Background: Large-scale CT-based reference standards for abdominal organ volume, incorporating age, sex, and body size, are limited. Purpose: To establish sex- and age-specific reference distributions for major abdominal organ volumes on non-contrast abdominopelvic CT in a nationwide Japanese cohort, providing a foundation for automated clinical assessment and dose optimization. Materials and Methods: In this retrospective, multicenter study using the Japan Medical Image Database, we identified all non-contrast abdominopelvic CT examinations performed in 2024. Unique adults with available data on age, sex, height, and weight were included. The final sample comprised 49,764 examinations (26,456 men and 23,308 women) conducted at nine institutions. Automated segmentation (TotalSegmentator v2.10.0) was used to produce organ volumes, excluding hollow viscera. The sex-specific 10th, 25th, 50th, 75th, and 90th percentiles were calculated. Age-volume relationships of body surface area (BSA)-normalized volumes (mL/m2) were modeled using natural cubic splines (four degrees of freedom) separately by sex. Results: Median male vs female volumes (mL) were as follows: liver, 1194.7 vs 1024.0; pancreas, 63.6 vs 52.2; spleen, 118.1 vs 95.1; kidneys (total), 268.3 vs 221.2; adrenals (total), 6.6 vs 4.2; iliopsoas (total), 483.4 vs 317.7; prostate, 24.9 (men only). Age-volume relationships of BSA-normalized volumes showed convex patterns for the liver, pancreas, and kidneys in both sexes and for male adrenal glands; lower values in older age groups for the spleen and iliopsoas in both sexes; and higher values in older age groups for the prostate and female adrenal glands. Conclusion: This nationwide Japanese CT cohort provides sex- and age-resolved volumetric reference standards.
These standards enable objective identification of abnormalities, support personalized medicine, and facilitate automated AI-based reporting to reduce radiologist workload and optimize radiation dose protocols. Key Results:
- Median volumes (men vs women, mL): liver 1195/1024; pancreas 64/52; spleen 118/95; kidneys 268/221; adrenals 6.6/4.2; iliopsoas 483/318; prostate 25.
- Body surface area-normalized age-volume relationships were convex for liver, pancreas, and kidneys in both sexes and for male adrenal glands.
- Spleen and iliopsoas declined monotonically with age in both sexes, whereas prostate and female adrenal glands increased monotonically.
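The two core computations behind such reference standards, BSA normalization and sex-specific percentiles, can be sketched as follows. The Du Bois formula is an assumption (the abstract does not state which BSA formula was used), and the volume distribution is synthetic.

```python
import numpy as np

def bsa_du_bois(height_cm, weight_kg):
    """Body surface area in m^2 (Du Bois formula, assumed here)."""
    return 0.007184 * (height_cm ** 0.725) * (weight_kg ** 0.425)

rng = np.random.default_rng(1)
n = 5000
liver_ml = rng.normal(1195, 180, n)          # synthetic male liver volumes
height = rng.normal(170, 7, n)               # cm
weight = rng.normal(68, 10, n)               # kg

bsa = bsa_du_bois(height, weight)
normalized = liver_ml / bsa                   # mL per m^2, as in the study

# the reported reference percentiles per sex
percentiles = np.percentile(liver_ml, [10, 25, 50, 75, 90])
```

In the study the normalized values are then modeled against age with natural cubic splines; the percentiles above are the raw-volume reference bands.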

6
Accuracy And Generalizability of an Open-Source Deep Learning Model For Facial Bone Segmentation on CT and CBCT Scans

Gkantidis, N.; Ghamri, M.; DOT, G.

2025-12-29 dentistry and oral medicine 10.64898/2025.12.28.25343101
Top 0.5%
60× avg

Aim: To evaluate the accuracy and generalizability of DentalSegmentator, an open-source deep learning tool, for automated segmentation of skeletal facial surfaces from computed tomography (CT) scans acquired under different imaging conditions. Materials and Methods: Ten human skulls were scanned using a CT scanner and three cone beam CT (CBCT) protocols (including an ultra-low-dose protocol) on two CBCT devices. High-accuracy reference surface models were acquired using an optical scanner. CBCT and CT scans were segmented automatically using DentalSegmentator. Three facial regions (forehead, zygomatic process, maxillary process) were defined on each model for quantitative assessment. Accuracy was measured as the mean absolute distance (MAD) and the standard deviation of absolute distances (SDAD) between segmented and reference models after best-fit superimposition. Results: Repeated segmentations were identical, confirming perfect reproducibility. Across all acquisition settings and regions, DentalSegmentator produced highly accurate skeletal surface models, with an overall MAD of 0.088 mm (IQR 0.073) and SDAD of 0.061 mm (IQR 0.028). Significant but small differences were detected between imaging systems (MAD: p < 0.001; SDAD: p = 0.003), with CT scans showing slightly reduced trueness compared with CBCT images. Conclusion: The open-source DentalSegmentator tool produced accurate skeletal facial surface segmentations across diverse CT and CBCT settings, demonstrating excellent generalizability, including under low-radiation conditions. Minor differences in trueness between imaging systems were small and unlikely to impact clinical or research use. Clinical Significance: Deep learning offers a robust foundation for automated 3D craniofacial surface extraction, supporting broader adoption of AI-driven workflows in both clinical and research contexts.
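MAD and SDAD between a segmented surface and a reference surface reduce to nearest-neighbor point distances. A brute-force sketch on synthetic point clouds (one-directional and skipping the best-fit superimposition step the study performs first):

```python
import numpy as np

def surface_distances(pts, ref):
    """Absolute distance from each segmented point to its nearest
    reference point (brute-force nearest neighbor)."""
    d = np.linalg.norm(pts[:, None, :] - ref[None, :, :], axis=2)
    return d.min(axis=1)

rng = np.random.default_rng(2)
ref = rng.uniform(0, 10, (400, 3))            # reference surface points (mm)
seg = ref + rng.normal(0, 0.08, ref.shape)    # segmentation, ~0.08 mm noise

dists = surface_distances(seg, ref)
mad = dists.mean()     # mean absolute distance
sdad = dists.std()     # standard deviation of absolute distances
```

MAD captures average surface error, while SDAD flags whether that error is uniform or concentrated in local outlier regions.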

7
End-to-End PET/CT Interpretation and Quantification with an LLM-Orchestrated AI Agent: A Real-World Pilot Study

Choi, H.; Bae, S.; Na, K. J.

2026-02-25 radiology and imaging 10.64898/2026.02.21.26346798
Top 0.5%
58× avg

Background: Although deep learning models have improved individual PET analysis, image processing, and quantification tasks, end-to-end automation from raw DICOM to quantitative clinical reporting remains limited, particularly in heterogeneous real-world settings. Methods: As a proof of concept, an autonomous large language model (LLM)-orchestrated multi-tool agent for end-to-end PET/CT interpretation was developed. A reasoning-based text LLM selected appropriate series from raw DICOM, coordinated registration and SUV conversion, invoked segmentation and detection tools, generated maximum-intensity projections, called a vision-enabled LLM for interpretation, and synthesized structured draft reports. The system was retrospectively evaluated in 170 patients undergoing baseline FDG PET/CT for lung cancer staging, using expert reports as reference. Results: The agent successfully completed the full end-to-end workflow from raw DICOM selection to structured draft report generation without human intervention in all 170 examinations. Primary tumor detection achieved 100% sensitivity. For nodal involvement, sensitivity was 84.8% and specificity was 39.4%, whereas distant metastasis detection showed 70.2% sensitivity and 65.0% specificity. Discrepancy analysis of 58 nodal and 57 metastatic mismatch cases revealed systematic false-positive findings related to reactive or physiologic uptake and false-negative findings involving small-volume or anatomically atypical metastases. Conclusion: LLM-orchestrated PET/CT agents can enable workflow-level automation from raw DICOM to quantification and structured draft reporting under real-world conditions. Although primary tumor detection was highly reliable, nodal and metastatic assessment revealed systematic limitations, supporting a collaborative role with continued expert oversight in complex clinical scenarios.
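The orchestration described (series selection, SUV conversion, segmentation, report drafting) can be caricatured as a dispatcher over named tools. Every function below is a hypothetical stub; in the real system an LLM planner chooses the next tool call, and each tool wraps registration, SUV conversion, or segmentation models.

```python
# Hypothetical tool stubs; the hard-coded sequence stands in for the
# LLM planner that decides which tool to invoke next.
def select_series(dicom_dir): return f"{dicom_dir}/PET_AC"
def convert_suv(series):      return {"series": series, "suv": True}
def segment_lesions(image):   return [{"site": "primary", "suvmax": 9.4}]
def draft_report(findings):   return f"Findings: {len(findings)} lesion(s)."

TOOLS = {"select_series": select_series, "convert_suv": convert_suv,
         "segment_lesions": segment_lesions, "draft_report": draft_report}

def run_agent(dicom_dir):
    """End-to-end pass from raw DICOM directory to a draft report."""
    series = TOOLS["select_series"](dicom_dir)
    image = TOOLS["convert_suv"](series)
    findings = TOOLS["segment_lesions"](image)
    return TOOLS["draft_report"](findings)

report = run_agent("/data/case_001")
```

The value of the tool-registry pattern is that each stage remains independently testable and replaceable while the planner only ever sees tool names and structured outputs.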

8
UCSF RMaC: University of California San Francisco 3D Multi-Phase Renal Mass CT Dataset with Tumor Segmentations

Sahin, S.; Diaz, E.; Rajagopal, A.; Abtahi, M.; Jones, S.; Dai, Q.; Kramer, S.; Wang, Z.; Larson, P. E. Z.

2026-02-12 radiology and imaging 10.64898/2026.02.11.26346096
Top 0.5%
58× avg

Current standard of care imaging practices cannot reliably differentiate among certain renal tumors such as benign oncocytoma and clear cell renal cell carcinoma (RCC), and between low and high grade RCCs. Previous work has explored using deep learning, radiomics, and texture analysis to predict renal tumor subtypes and differentiate between low and high grade RCCs with mixed success. To further this work, large diverse datasets are needed to improve model performance and provide strong evaluation sets. In this work, a dataset of 831 multi-phase 3D CT exams was curated. Each exam contains up to three contrast-enhanced CT phases. Tumor outlines or bounding boxes were annotated and registered to the image volumes. The pathology results for each tumor and relevant patient metadata are also included.

9
Transfer Learning for Medical Imaging: An Empirical Evaluation of CNN Architectures on Chest Radiographs

Salve, H. S.

2026-01-08 radiology and imaging 10.64898/2026.01.07.26343591
Top 0.5%
58× avg

This paper presents a comparative study of five state-of-the-art CNN architectures (VGG19, ResNet50, InceptionV3, DenseNet121, and EfficientNetB0) for multi-class classification of chest X-ray (CXR) images into four categories: Edema, Normal, Pneumonia, and Tuberculosis (TB). The models were trained, validated, and tested on a dataset comprising 6,092 training and 325 testing images across the four classes. Each architecture was initialized with ImageNet weights, augmented with a custom classifier, and fine-tuned under identical conditions to ensure a fair comparison. The models were evaluated on a comprehensive set of metrics, including accuracy, per-class recall, training time, and model complexity. Experimental results indicate that VGG19 achieved the highest classification accuracy of 98.15%, followed closely by ResNet50 at 97.54%. This study provides empirical evidence to guide the selection of appropriate deep learning models for chest X-ray diagnosis, balancing performance with operational constraints.

10
Comparative Diagnostic Accuracy of Magnetic Resonance Elastography and Diffusion-Weighted Imaging in Differentiating Benign and Malignant Focal Liver Lesions: A Systematic Review and Meta-Analysis

Hassankhani, A.; Valizadeh, P.; Jannatdoust, P.; Amoukhteh, M.; Mohammadi, A.; Gholamrezanezhad, A.; Haq, A.

2025-12-15 radiology and imaging 10.64898/2025.12.11.25342122
Top 0.5%
57× avg

Background: Accurate differentiation of benign and malignant focal liver lesions (FLLs) is essential for clinical decision-making. Magnetic resonance elastography (MRE) and diffusion-weighted imaging (DWI) are advanced MRI techniques used for noninvasive lesion characterization, but their comparative diagnostic performance has not been definitively established. Objective: To systematically compare the diagnostic accuracy of MRE and DWI for distinguishing benign from malignant FLLs. Methods: A systematic review and meta-analysis were conducted following PRISMA guidelines. PubMed, Embase, and Scopus were searched through July 2025 for studies directly comparing MRE and DWI in the same patient cohorts with focal liver lesions, using histopathology or validated imaging follow-up as the reference standard. Sensitivity, specificity, and area under the curve (AUC) were pooled using bivariate random-effects models, with paired analysis to compare modalities. Results: A total of 219 patients with 284 focal liver lesions were analyzed. MRE demonstrated higher pooled sensitivity (93.8%, 95% CI: 85.6-97.5) and specificity (89.9%, 95% CI: 74.6-96.4) than DWI (sensitivity 86.2%, 95% CI: 80.5-90.5; specificity 83.4%, 95% CI: 74.3-89.8). MRE also had a higher AUC (0.97 vs. 0.88). Likelihood ratio analysis indicated MRE's stronger ability to both confirm and exclude malignancy. Paired meta-analysis confirmed a statistically significant increase in sensitivity for MRE (relative sensitivity 1.09; p = 0.018), with no significant difference in specificity. Conclusion: MRE demonstrates superior sensitivity and overall diagnostic accuracy compared with DWI for differentiating benign and malignant FLLs. Further large-scale prospective studies are needed to confirm these results and determine optimal cutoff values to guide clinical decision-making.
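The per-study quantities that feed such a meta-analysis, sensitivity, specificity, and likelihood ratios, derive from a 2×2 table; a minimal sketch with hypothetical counts:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and likelihood ratios from a 2x2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec)    # high LR+ helps confirm malignancy
    lr_neg = (1 - sens) / spec    # low LR- helps exclude malignancy
    return sens, spec, lr_pos, lr_neg

# hypothetical counts for one modality in one study
sens, spec, lr_pos, lr_neg = diagnostic_metrics(tp=90, fp=10, fn=10, tn=90)
```

"Stronger ability to both confirm and exclude malignancy" translates to a higher LR+ together with a lower LR- for MRE than for DWI.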

11
Predicting intervertebral disc degeneration using Pyradiomics features and XGBoost classification

Muftuler, L. T.; Drobek, A.; Bukowy, J. D.; Duwe, K.; Sudersanam, V.; Harrington, J.; Van Zant, E.; Duenweg, S. R.; Shanbhag, D. D.

2026-01-15 radiology and imaging 10.64898/2026.01.13.26343807
Top 0.5%
57× avg

Background: Disc degeneration is the primary cause of low back pain, although the disc itself is not usually the source of the pain; instead, it can lead to various clinically significant conditions that cause pain. There are currently no objective measures of disc degeneration. Purpose: The lack of objective measures of disc degeneration can cause uncertainty in treatment decisions. Currently, disc degeneration is graded by visual assessment of MRI, which often leads to uncertainty and disagreement. The objective of this study was therefore to develop a simple, efficient, accurate, and objective diagnostic tool for assessing disc degeneration. Study Type: Prospective (data acquired on site) and retrospective (data from an online repository). Population: Lumbar spine MRI data from 277 participants were used; 208 were from an online repository and 69 from our site. Field Strength/Sequence: 3.0T; T2-weighted 2D and 3D fast spin echo pulse sequences. Assessment: A fully automated method was implemented in which selected radiomics features are calculated from T2-weighted MRI and used to classify the disc degeneration grade. Binary disc masks are generated using nnU-Net, and radiomics features are extracted using Pyradiomics. Optimal preprocessing approaches were explored to obtain reliable feature calculations from repeated scans. Several advanced decision tree classification methods were also tested. Statistical Tests: F1 accuracy score, area under the curve, confidence interval. Results: XGBoost was in good agreement with the rater, and the important features used in classification were in accord with expected changes in discs. Data Conclusion: Automated evaluation of disc degeneration streamlines the physician's workflow and reduces uncertainty. Using radiomics features enables explainability and provides simple and robust training for machine learning approaches. Level of Evidence: 2. Technical Efficacy: Stage 3.

12
An Exploratory Study of ResNet and Capsule Neural Networks for Brain Tumor Detection in MRI

Mensah, S.; Atsu, E. K. A.; Ammah, P. N. T.

2026-02-09 radiology and imaging 10.64898/2026.02.05.26345460
Top 0.5%
57× avg

Brain tumors are among the most life-threatening diseases, requiring precise and timely detection for effective treatment. Traditional methods for brain tumor detection rely heavily on manual analysis of MRI scans, which is time-consuming, subjective, and prone to human error. With advancements in deep learning, Convolutional Neural Networks (CNNs) have become popular for medical image analysis. However, CNNs are limited in their ability to capture spatial hierarchies and pose variations, which reduces their accuracy, particularly for tasks like brain tumor segmentation where precise spatial relationships are crucial. This research introduces a hybrid Capsule Neural Network (CapsNet) and ResNet50 model designed to overcome the limitations of traditional CNNs by capturing both spatial and pose information in MRI scans. The proposed model leverages ResNet50 for feature extraction and CapsNet for handling spatial relationships, leading to more accurate segmentation. The study evaluates the model on the BraTS2020 dataset and compares its performance to state-of-the-art CNN architectures, including U-Net and pure CNN models. The hybrid model, featuring a custom 5-cycle dynamic routing algorithm to enhance capsule agreement at tumor boundaries, achieved 98% accuracy and an F1-score of 0.87, demonstrating superior performance in detecting and segmenting brain tumors. This study pioneers the systematic evaluation of the ResNet50 + CapsNet hybrid on the BraTS2020 dataset, with a tailored class weighting scheme that addresses class imbalance and improves effectiveness in identifying irregularly shaped tumors and smaller tumor regions. The study offers a robust solution for automating brain tumor detection.
Future work will explore the use of Capsule Networks alone for brain tumor detection in MRI data and investigate alternative Capsule Network architectures, as well as their integration into clinical decision support systems.

13
Deep Neural Patchworks Predict Renal Imaging Biomarkers from Non-Contrast MRI via Knowledge Transfer from Arterial-Phase Contrast-Enhanced MRI

Kästingschäfer, K. F.; Fink, A.; Rau, S.; Reisert, M.; Kellner, E.; Nolde, J. M.; Kottgen, A.; Sekula, P.; Bamberg, F.; Russe, M. F.

2026-02-26 radiology and imaging 10.64898/2026.02.24.26346961
Top 0.6%
55× avg

Rationale and Objectives: Contrast-enhanced (CE) MRI provides clear corticomedullary contrast for renal compartment delineation but may be contraindicated or undesirable in routine practice. We aimed to enable automated extraction of renal imaging biomarkers from routine non-contrast-enhanced (NCE) T1-weighted MRI by transferring CE-derived compartment labels. Materials and Methods: This retrospective single-center study (January 2017 to December 2021) included 200 participants with paired arterial-phase CE and NCE T1-weighted MRI. Cortex, medulla, and sinus were manually segmented on CE MRI and rigidly transferred to NCE MRI to provide voxel-level reference labels. A hierarchical 3D Deep Neural Patchworks model was trained on 100 examinations (90 training/10 validation) and evaluated on an independent test set of 100 examinations, using the transferred CE masks on NCE as reference. Performance was assessed using Dice similarity of segmentations, and biomarker agreement was assessed using volumes and surface areas (Pearson/Spearman, MAE, Lin's CCC, and Bland-Altman). Results: Whole-kidney segmentation Dice was 0.950 (left) and 0.953 (right). Total kidney volume showed high agreement with minimal bias (MAE 8.76 mL, 2.5% of mean; CCC 0.983; bias -1.56 mL; 95% limits of agreement -28.81 to 25.69 mL). Cortex volume was modestly overestimated and medulla volume underestimated, shifting predicted compartment fractions toward cortex (74.7% vs. 72.1% in ground truth; medulla 21.5% vs. 24.3%; sinus 3.8% vs. 3.6%). Sinus volume maintained high concordance despite higher Dice dispersion. Surface area was systematically underestimated with low concordance. Conclusion: CE-supervised knowledge transfer enables accurate, well-calibrated kidney volumetry from routine NCE MRI and supports contrast-free renal biomarker extraction. Surface area estimation remains challenging.
Take-home Messages:
- CE-supervised label transfer enables accurate, well-calibrated contrast-free kidney volumetry on routine non-contrast T1-weighted MRI.
- Compartment volumetry is feasible but shows systematic cortex overestimation and medulla underestimation; surface area remains non-interchangeable due to boundary uncertainty.
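The two agreement measures used above, Dice similarity on binary masks and Bland-Altman bias with 95% limits of agreement on volumes, can be sketched on synthetic data (all masks and volumes below are invented):

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient of two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def bland_altman(pred, ref):
    """Bias and 95% limits of agreement between paired measurements."""
    diff = np.asarray(pred, float) - np.asarray(ref, float)
    bias, sd = diff.mean(), diff.std(ddof=1)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

rng = np.random.default_rng(3)
mask_ref = np.zeros((40, 40, 40), bool)
mask_ref[8:30, 8:30, 8:30] = True              # synthetic kidney mask
mask_pred = np.roll(mask_ref, 1, axis=0)       # prediction: 1-voxel shift

d = dice(mask_pred, mask_ref)
vol_ref = rng.normal(300, 60, 100)             # reference volumes (mL)
vol_pred = vol_ref + rng.normal(-1.5, 14, 100) # predictions, small bias
bias, loa = bland_altman(vol_pred, vol_ref)
```

The pairing is informative because Dice measures voxel overlap while Bland-Altman reveals systematic volume bias that high Dice can hide.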

14
Quality versus quantity of training datasets for artificial intelligence-based whole liver segmentation

Castelo, A.; O'Connor, C.; Gupta, A. C.; Anderson, B. M.; Woodland, M.; Altaie, M.; Koay, E. J.; Odisio, B. C.; Tang, T. T.; Brock, K. K.

2026-02-18 radiology and imaging 10.64898/2026.02.17.26346486
Top 0.6%
55× avg

Artificial intelligence (AI)-based segmentation has many medical applications, but limited curated datasets challenge model training; this study compares the impact of dataset annotation quality and quantity on whole-liver AI segmentation performance. We obtained 3,089 abdominal computed tomography scans with whole-liver contours from MD Anderson Cancer Center (MDA) and a MICCAI challenge. A total of 249 scans were withheld for testing, of which 30 (the MICCAI challenge data) were reserved for external validation. The remaining scans were divided into mixed-curation and highly curated groups, randomly sampled into sub-datasets of various sizes, and used to train 3D nnU-Net segmentation models. Dice similarity coefficients (DSC), surface DSC with 2 mm margins (SD 2mm), the 95th percentile of Hausdorff distance (HD95), and 2D axial slice DSC (Slice DSC) were used to evaluate model performance. The highly curated 244-scan model (DSC = 0.971, SD 2mm = 0.958, HD95 = 2.98 mm) did not differ significantly on 3D evaluation metrics from the mixed-curation 2,840-scan model (DSC = 0.971 [p > .999], SD 2mm = 0.958 [p > .999], HD95 = 2.87 mm [p > .999]). The 710-scan mixed-curation model (Slice DSC = 0.929) significantly outperformed the highly curated 244-scan model (Slice DSC = 0.923 [p = 0.012]) on the 30 external scans. Highly curated datasets yielded performance equivalent to datasets a full order of magnitude larger. The benefits of larger, mixed-curation datasets are evidenced in model generalizability metrics and local improvements. In conclusion, tradeoffs between dataset quality and quantity for model training are nuanced and goal-dependent.
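HD95, one of the reported metrics, is the 95th percentile of symmetric nearest-neighbor surface distances. A simplified point-set sketch (real implementations extract boundary voxels from the masks first; the point clouds below are synthetic):

```python
import numpy as np

def hd95(a_pts, b_pts):
    """95th percentile of the symmetric nearest-neighbor distances
    between two point sets (simplified 95% Hausdorff distance)."""
    d = np.linalg.norm(a_pts[:, None, :] - b_pts[None, :, :], axis=2)
    d_ab = d.min(axis=1)   # each a-point to nearest b-point
    d_ba = d.min(axis=0)   # each b-point to nearest a-point
    return np.percentile(np.concatenate([d_ab, d_ba]), 95)

rng = np.random.default_rng(4)
ref = rng.uniform(0, 50, (300, 3))            # reference liver surface (mm)
pred = ref + rng.normal(0, 1.0, ref.shape)    # segmentation, ~1 mm error
value = hd95(pred, ref)
```

Taking the 95th percentile instead of the maximum makes the metric robust to a handful of outlier surface points, which is why it complements overlap metrics like DSC.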

15
The Effects of External Laser Positioning Systems for MRI Simulation on Image Quality and Quantitative MRI Values

McCullum, L.; Ding, Y.; Fuller, C. D.; Taylor, B. A.

2026-03-07 radiology and imaging 10.64898/2026.03.06.26347809
Top 0.6%
50× avg

Background and Purpose: Magnetic resonance imaging (MRI) for radiation therapy treatment planning is currently being used in many anatomical sites to better visualize soft tissue landmarks, a technique known as an MRI simulation. A core component of modern MRI simulation configurations are the use of external laser positioning systems (ELPS) to help set up the patient. Though necessary for accurate and reproducible patient setup, the ELPS, if left on during imaging, may interfere negatively with image quality due to leaking electronic noise, of which MRI is sensitive to. It is currently unknown whether this leakage of electronic noise may further affect quantitative values derived from clinically employed relaxometric, diffusion, and fat fraction sequences. Therefore, in this study, we aim to characterize the impact of MRI simulation lasers on general image quality and quantitative imaging accuracy. Materials and Methods: First, a cine acquisition was used to visualize the real-time changes in image signal-to-noise ratio (SNR) from when the ELPS was deactivated to activated. To validate this effect quantitatively, the SNR was measured using the American College of Radiology (ACR) recommended protocol in a homogeneous phantom with the integrated body, 18-channel UltraFlex small, 18-channel UltraFlex large, 32-channel spine, and 16-channel shoulder coils. Next, a geometric distortion algorithm was tested in two vendor-provided phantoms while using the integrated body coil and the ACR Large Phantom protocol was tested. Finally, a series of quantitative MRI scans were performed using a CaliberMRI Model 137 Mini Hybrid phantom to validate quantitative T1, T2, and ADC while a Calimetrix PDFF-R2* phantom was used for quantitative PDFF and R2*. All scans were performed with both the ELPS both deactivated and activated. 
Results: Visible electronic noise artifacts appeared on the cine acquisition with the integrated body coil when the ELPS was activated, corresponding to a four-fold decrease in SNR measured with the ACR protocol. This SNR drop was not seen with the remaining tested coils. ELPS activation negatively affected the automatic fiducial detection algorithm, causing misidentifications, whereas detection was perfect with the ELPS deactivated. Degradation in image intensity uniformity, percent signal ghosting, and low-contrast object detectability was seen during ACR Large Phantom testing with the 20-channel Head/Neck coil. Quantitative MRI values were concordant with the ELPS deactivated and activated, although a consistent increase in the standard deviation inside the ADC vials was seen when the ELPS was activated. Discussion: The extra noise induced by activating the ELPS during imaging should be avoided because it can unnecessarily increase image noise. This is particularly true for mandatory quality assurance testing of image quality and geometric distortion, which uses the integrated body coil, the coil most susceptible to ELPS-induced noise. Clear clinical guidelines should be implemented to make this issue known to MRI technologists, physicists, and other relevant staff using an MRI with a supplementary ELPS for patient alignment.
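The reported four-fold SNR drop can be illustrated with a toy version of an ACR-style measurement: mean signal in a phantom ROI divided by the standard deviation in a background ROI. The ROI placement, signal level, and noise amplitudes below are illustrative assumptions, not the study's data, and the real ACR procedure prescribes specific ROI geometry.

```python
import numpy as np

def roi_snr(image, signal_roi, noise_roi):
    """SNR estimate in the spirit of the ACR protocol: mean signal in a
    phantom ROI divided by the standard deviation in a background ROI.
    (ROI placement here is a simplification of the real ACR procedure.)"""
    return image[signal_roi].mean() / image[noise_roi].std(ddof=1)

def synth_phantom(noise_sd, rng):
    """Hypothetical image: homogeneous phantom (signal 100) in the top
    half, noise-only background in the bottom half."""
    img = rng.normal(0.0, noise_sd, (64, 64))
    img[:32, :] += 100.0
    return img

rng = np.random.default_rng(0)
signal_roi = (slice(0, 32), slice(None))   # phantom half
noise_roi = (slice(32, 64), slice(None))   # background half

snr_quiet = roi_snr(synth_phantom(1.0, rng), signal_roi, noise_roi)  # ELPS off
snr_noisy = roi_snr(synth_phantom(4.0, rng), signal_roi, noise_roi)  # 4x leaked noise
```

With four times the background noise standard deviation, the measured SNR falls by roughly a factor of four, mirroring the body-coil result above.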

16
Dental teachers' perspectives on Extended Reality in dental education: an international survey

Bjelovucic, R.; de Freitas, B. N.; Norholt, S. E.; Taneja, P.; Terp Hoybye, M.; Pauwels, R.

2026-03-05 dentistry and oral medicine 10.64898/2026.03.05.26347677
Top 0.6%
49× avg

Introduction: Digital technologies are reshaping how health professionals are trained, and extended reality (XR) has gained attention as a tool for skills development in dental education. Yet successful integration depends largely on educators' perceptions, readiness, and working conditions. This study aimed to explore dental educators' views of the educational value of XR, the barriers they experience, and how familiarity with immersive technologies relates to their use in teaching. Materials and Methods: A cross-sectional, web-based survey was conducted among dental educators. The questionnaire included items on demographics, familiarity with and frequency of XR use, and perceptions of educational value, barriers, and curricular integration. Descriptive statistics were calculated, and Spearman correlation analyses were performed to explore associations between familiarity, use, and perceived benefits of XR. Results: Respondents reported positive attitudes toward XR, particularly for improving students' understanding of complex anatomy (mean = 6.02/7), skill development (5.68/7), and confidence and preparedness for clinical practice (5.08-5.20/7). XR was mainly viewed as a complement to traditional teaching rather than a replacement (mean = 3.77/7). Strong correlations were observed between perceived improvements in confidence, skills, and clinical readiness (r = 0.71-0.89, P < 0.0001). High costs, limited technical support, and time constraints were the most prominent barriers to use. Conclusion: Overall, dental educators appear open to XR but constrained by structural and organizational factors rather than a lack of interest. Faculty development, hands-on training opportunities, and institutional support may therefore be essential to translating positive perceptions into meaningful and sustained integration of immersive technologies in dental curricula.
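The Spearman analysis used above is a rank correlation, so it captures any monotone association between Likert items without assuming the scale steps are equally spaced. A minimal sketch with `scipy.stats.spearmanr`; the responses below are invented for illustration, not survey data:

```python
from scipy.stats import spearmanr

# Hypothetical 7-point Likert responses from ten educators
familiarity = [1, 2, 2, 3, 4, 5, 5, 6, 7, 7]
perceived_benefit = [2, 1, 3, 3, 5, 4, 6, 6, 6, 7]

# spearmanr ranks both variables (handling ties by averaging) and
# returns the correlation of the ranks plus a two-sided p-value
rho, p_value = spearmanr(familiarity, perceived_benefit)
```

A strongly monotone pairing like this yields a rho close to 1 with a small p-value, the pattern the survey reports for confidence, skills, and clinical readiness.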

17
Clinical validation of automated and multiple manual callosal angle measurement methods in idiopathic normal pressure hydrocephalus

Seo, W.; Jabur Agerberg, S.; Rashid, A.; Holmstrand, N.; Nyholm, D.; Virhammar, J.; Fallmar, D.

2026-02-14 radiology and imaging 10.64898/2026.02.12.26346185
Top 0.7%
48× avg

Introduction: Idiopathic normal pressure hydrocephalus (iNPH) is a partially reversible neurological disorder in which imaging biomarkers support diagnosis and surgical decision-making. The callosal angle (CA) is one of the most robust radiological markers of iNPH and has also been associated with postoperative shunt outcome. However, several manual measurement variants exist, and artificial intelligence (AI)-based tools now enable automatic CA measurement. Materials and Methods: In total, 71 patients (40 with confirmed iNPH and 31 controls) were included. Six predefined manual methods for measuring the CA were applied to preoperative 3D T1-weighted MRI and evaluated for diagnostic performance and interobserver agreement. An AI-derived automatic CA (cMRI from Combinostics) was included as a seventh method and compared with the traditional manual method (perpendicular to the bicommissural plane and through the posterior commissure). Automatic measurements were additionally assessed in pre- and postoperative scans to evaluate robustness against shunt-related artifacts. Results: All seven CA variants significantly differentiated iNPH patients from controls (p < 0.05). The traditional method showed the highest discriminative performance (AUC = 0.986, SE = 0.012), while alternative planes demonstrated slightly lower accuracy (AUC range = 0.957-0.978). Interobserver agreement for manual measurements was good to excellent (ICC = 0.687-0.977). Automatic CA measurements showed excellent correlation with the traditional method (preoperative ICC = 0.92; postoperative ICC = 0.96). Conclusion: Although several CA positions perform comparably, the traditional method remains marginally superior and is best supported by the literature. Automated CA measurements closely match expert manual assessment in pre- and postoperative imaging, supporting clinical implementation.
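For readers unfamiliar with the metric, the callosal angle is the angle subtended at a point on the corpus callosum by the medial walls of the lateral ventricles on a coronal slice. A minimal 2-D sketch from three annotated points; the three-point parameterization and the coordinates are illustrative assumptions, not any of the paper's six measurement methods:

```python
import numpy as np

def callosal_angle(vertex, left_wall, right_wall):
    """Angle in degrees at `vertex` between the rays toward two points
    on the ventricular walls, via the normalized dot product."""
    u = np.asarray(left_wall, float) - np.asarray(vertex, float)
    v = np.asarray(right_wall, float) - np.asarray(vertex, float)
    cos_theta = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    # clip guards against arccos domain errors from rounding
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

angle = callosal_angle(vertex=(0, 0), left_wall=(-1, 1), right_wall=(1, 1))
```

Steep, nearly vertical ventricular walls give a small angle, the pattern associated with iNPH, while the symmetric example above yields 90 degrees.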

18
Double-Bowtie Filter Design for Pediatric Spectral CT Imaging

Ge, Y.; Sandvold, O. F.; Proksa, R.; Perkins, A. E.; Koehler, T.; Brown, K. M.; Jin, Y.; Daerr, H.; Manjeshwar, R. M.; Noël, P. B.

2026-01-16 radiology and imaging 10.64898/2026.01.15.26344121
Top 0.7%
43× avg

Purpose: To develop and evaluate a novel double-bowtie filter integrating a K-edge material layer with a conventional Teflon filter for pediatric spectral computed tomography (CT). The proposed design aims to enhance spectral signal-to-noise ratio (SNR) and spectral separation while maintaining radiation dose levels suitable for pediatric imaging. Methods: A simulation framework was set up and used to model a rapid kVp-switching CT system operating at 70/110 kVp with realistic tube power and geometry constraints. Pediatric phantoms of three sizes (100-200 mm anterior-posterior width) were used to evaluate performance. Five accessible and safe filter materials (gadolinium (Gd), holmium (Ho), erbium (Er), silver (Ag), and tin (Sn)) were tested in combination with a Teflon bowtie. System performance was quantified using virtual monoenergetic image (VMI) SNR at 40 keV and 70 keV, and the area under the monoenergetic SNR curve (AUMC) as a comprehensive spectral image quality metric. Dose consistency with a traditional Teflon bowtie reference was enforced. Results: The Teflon + Gd configuration achieved the highest performance, improving AUMC by 47.5% on average and by up to 56% for the largest phantom. VMI SNR increased by approximately 49% at 40 keV and 42% at 70 keV. Conclusions: The double-bowtie concept substantially enhances spectral performance. The Teflon + Gd design provides a manufacturable, pediatric-optimized solution adaptable to kVp-switching and other spectral CT architectures, offering improved diagnostic quality at low dose levels.
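The AUMC metric reads as an integral of VMI SNR over monoenergetic energy; a plausible trapezoidal-rule sketch of how such an area could be computed. The energy grid, SNR values, and integration range are made up for illustration, and the paper's exact definition and normalization may differ:

```python
import numpy as np

def aumc(energies_kev, vmi_snr):
    """Area under the monoenergetic SNR curve via the trapezoidal rule."""
    e = np.asarray(energies_kev, float)
    s = np.asarray(vmi_snr, float)
    return float(np.sum(np.diff(e) * (s[1:] + s[:-1]) / 2.0))

# Hypothetical SNR-vs-energy curves for a reference and a K-edge filter
energies = [40, 50, 60, 70]
snr_teflon = [10.0, 14.0, 16.0, 17.0]
snr_teflon_gd = [14.9, 20.0, 23.0, 24.1]  # invented ~45-49% per-point gain

gain = aumc(energies, snr_teflon_gd) / aumc(energies, snr_teflon) - 1.0
```

Comparing the two areas gives a single relative figure of merit per configuration, which is how a percentage improvement like the reported 47.5% can be expressed.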

19
Clinical Evaluation of a Novel Deep Learning-Based Auto-Segmentation Software: Utility and Potential Pitfalls

Tozuka, R.; Saito, M.; Matsuda, M.; Akita, T.; Nemoto, H.; Komiyama, T.; Kadoya, N.; Jingu, K.; Onishi, H.

2026-01-11 radiology and imaging 10.64898/2026.01.08.26343652
Top 0.8%
40× avg

Background: Accurate contouring of target volumes and organs at risk is critical for radiotherapy. While deep learning (DL) models offer efficient automation, their generalizability to real-world clinical cases containing anatomical variations and artifacts requires rigorous validation. Purpose: To evaluate the clinical accuracy and robustness of RatoGuide, a novel DL-based auto-segmentation software, using a dataset derived from routine clinical practice including atypical cases. Methods: This single-center retrospective study included 36 patients treated for head and neck, thoracic, abdominal, and pelvic cancers. The cohort was intentionally selected to encompass diverse anatomies and artifacts (e.g., pacemakers, artificial femoral head replacement). Auto-contours generated by RatoGuide were compared with expert-approved manual contours. Performance was evaluated quantitatively using the Dice Similarity Coefficient (DSC) and 95th percentile Hausdorff Distance (HD95), and qualitatively via a 5-point visual assessment scale (higher is better) by four independent reviewers. A score of ≤2 by multiple reviewers was defined as failure. Results: Overall, the mean DSC, HD95, and visual assessment score were 0.79 ± 0.19, 6.35 ± 12.2 mm, and 3.65 ± 0.88, respectively. The mean DSC exceeded 0.8 in 62% (23/37 organ structures) of the evaluated structure types, and 93.5% (315/337) of all contours were considered clinically acceptable based on visual evaluation. However, lower performance was observed in small structures (e.g., optic chiasm) and low-contrast organs (e.g., esophagus). Conclusions: RatoGuide demonstrated favorable performance for major organs across various anatomical regions, consistent with benchmarks reported in the literature. However, performance variability in atypical cases underscores the necessity of rigorous visual verification by experts for clinical implementation.
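The DSC reported above is twice the overlap divided by the summed mask sizes, so 1.0 means perfect agreement and values fall quickly for small structures where a few misplaced voxels dominate. A minimal sketch for boolean masks; the toy masks are illustrative, not the study's contours:

```python
import numpy as np

def dice(a, b):
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|) for two
    boolean segmentation masks (1.0 = perfect agreement)."""
    a = np.asarray(a, bool)
    b = np.asarray(b, bool)
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * np.logical_and(a, b).sum() / total

# Toy 4x4 masks that overlap on half their area
auto = np.zeros((4, 4), bool)
auto[:2, :] = True        # 8 voxels
manual = np.zeros((4, 4), bool)
manual[1:3, :] = True     # 8 voxels, 4 shared with `auto`
```

Here the shared row gives 4 overlapping voxels out of 16 total, i.e. a DSC of 0.5, well below the 0.8 level the study uses as a benchmark.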

20
DBT-2026, a de-identified publicly available dataset of digital breast tomosynthesis exams with ground truth biopsies

Wu, J.; Perandini, L.; Batra, T.; Igoshin, S.; Bari, S.; de Araujo, A. L.; Willemink, M. J.

2026-03-04 radiology and imaging 10.64898/2026.03.03.25337924
Top 0.8%
39× avg

Digital breast tomosynthesis (DBT) is a powerful imaging modality that allows for improved lesion visibility, characterization, and localization compared to conventional two-dimensional digital mammography. DBT has been increasingly adopted in screening and diagnostic settings globally, particularly for women with dense breast tissue, where tissue overlap presents a significant diagnostic challenge. Here we describe DBT-2026, a real-world imaging dataset with 558 DBT exams from 558 patients with Breast Imaging Reporting and Data System (BI-RADS) scores of 0, 1, or 2. Each case contains one DBT examination in combination with expert annotations and free-text radiology reports that describe the radiological findings, produced in routine clinical practice. To protect patient privacy, all images and reports have been de-identified. The dataset is made freely available to researchers for non-commercial projects to facilitate and encourage research in breast cancer imaging.